Analysis of Covariate Results

Author: Lauren Khoury

Published: May 27, 2025

Background

We ran batches of simulations to generate data, fit linear models, and extract the results.

In each simulation, we generated data with a dichotomous \(X\). We ran three batches, with the population parameter for \(X\) set to 0, 0.3, or 0.5, respectively.

We manipulated the following variables:

Variable       Description                                   Values
n_obs          Number of observations in a sample            100, 150, 200, 300, 400
n_covs         Number of total available covariates          4, 8, 12, 16
p_good_covs    Proportion of “good” covariates*              0.25, 0.50, 0.75
r_ycov         Correlation between \(Y\) and covariates      0.3, 0.5
r_cov          Correlation between the “good” covariates*    0.3

  • Note: here we define “good” covariates as ones that have a nonzero relationship with \(Y\)


We fully crossed all levels, yielding 120 unique research settings.
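As a quick check on that count, the full crossing of the factor levels above can be enumerated. A minimal sketch in Python (the analysis itself was run in R, so this is purely illustrative):

```python
from itertools import product

# Factor levels from the design table above
levels = {
    "n_obs": [100, 150, 200, 300, 400],
    "n_covs": [4, 8, 12, 16],
    "p_good_covs": [0.25, 0.50, 0.75],
    "r_ycov": [0.3, 0.5],
    "r_cov": [0.3],
}

# Fully cross all levels: 5 * 4 * 3 * 2 * 1 = 120 unique settings
settings = list(product(*levels.values()))
print(len(settings))  # 120
```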
We used the following 7 methods to select covariates to include in a linear model:

  1. No covariates
  2. All covariates
  3. P-hacking
  4. R
  5. Partial R
  6. Full lm
  7. LASSO

We fit a linear model for each method and, from the model output, extracted the estimate for \(X\), the standard error of this estimate, and the p-value of the \(X\) effect.

We repeated this 20,000 times for each research setting.

We present the results here.

Data Analysis

Glimpse data

There is one dataset for each value of \(b_x\) (0, 0.3, 0.5). Each dataset has 16,800,000 observations: 120 unique settings \(\times\) 20,000 simulations each \(\times\) 7 methods.

Data for \(b_x = 0\), as an example, is shown below.

Rows: 16,800,000
Columns: 16
$ job_num       <int> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10…
$ method        <fct> no_covs, all_covs, p_hacked, r, partial_r, full_lm, lass…
$ simulation_id <int> 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,…
$ estimate      <dbl> 0.2158, 0.1872, 0.2218, 0.1812, 0.1812, 0.1812, 0.1872, …
$ SE            <dbl> 0.157, 0.151, 0.157, 0.151, 0.151, 0.151, 0.151, 0.156, …
$ p_value       <dbl> 0.171, 0.218, 0.159, 0.234, 0.234, 0.234, 0.217, 0.515, …
$ ndf           <int> 1, 5, 3, 2, 2, 2, 4, 1, 5, 4, 2, 2, 2, 3, 1, 5, 1, 2, 2,…
$ ddf           <int> 148, 144, 146, 147, 147, 147, 145, 148, 144, 145, 147, 1…
$ covs_tpr      <dbl> 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1,…
$ covs_fpr      <dbl> 0.000, 1.000, 0.667, 0.000, 0.000, 0.000, 0.667, 0.000, …
$ n_obs         <int> 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 150, 1…
$ b_x           <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ n_covs        <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
$ r_ycov        <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0…
$ p_good_covs   <dbl> 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.…
$ r_cov         <dbl> 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0…

Zero \(X\) Effect

First, we look at the zero \(X\) effect condition to compare the Type I errors across methods and research settings.

Type I error

We will look at the overall Type I error across methods, then will compare the error of each method across the manipulated variables: n_obs, n_covs, p_good_covs, and r_ycov (r_cov was held constant at 0.3).

by method

We will first consider the Type I error by the selection method. Here we calculate the proportion of significant effects (\(p < 0.05\)), the Type I error, displayed below as a bar plot.
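The Type I error computation can be sketched as follows. The analysis itself was run in R; this Python sketch assumes a vector of p-values like the p_value column shown in the glimpse above, with toy values rather than actual simulation results:

```python
# Type I error: proportion of significant results (p < 0.05)
# when the true effect is zero (b_x = 0).
def type1_error(p_values, alpha=0.05):
    return sum(p < alpha for p in p_values) / len(p_values)

# Hypothetical p-values for one method in one setting
p_vals = [0.171, 0.218, 0.159, 0.234, 0.012, 0.515]
print(round(type1_error(p_vals), 3))  # 0.167
```

In the actual analysis, this proportion is computed per method within each research setting.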

From this plot, we can see that the p-hacking method leads to inflated Type I error rates. The no covariates, all covariates, and r approaches are all at the expected 0.05 mark, while partial r, full lm, and lasso show slight inflation, but are still relatively close.

Here we view the distributions of the Type I error rate by method, beginning by isolating the p-hacked method.


We see that while the average Type I error for the p-hacked method was 0.172, it reached as high as 0.408, further emphasizing the inflation of error.

Removing this invalid method, we can view the distributions of the remaining 6 methods.


We see that the no covariates, all covariates, and r selection methods show approximately normal distributions centered around 0.05, while partial r, full lm, and lasso are slightly right-skewed, with full lm having the greatest skew.

by n_obs

We will view the Type I error rates of each method for the different levels of the number of observations in a sample. In the table below, we see the minimum, maximum, and average Type I error for each level of n_obs by each method.

Type I error by n_obs
method n_obs typeI_min typeI_max typeI_mean
no_covs 100 0.047 0.053 0.050
no_covs 150 0.046 0.052 0.049
no_covs 200 0.048 0.052 0.050
no_covs 300 0.047 0.053 0.050
no_covs 400 0.048 0.052 0.050
all_covs 100 0.048 0.054 0.050
all_covs 150 0.047 0.053 0.050
all_covs 200 0.048 0.053 0.050
all_covs 300 0.048 0.053 0.050
all_covs 400 0.047 0.054 0.050
p_hacked 100 0.078 0.396 0.179
p_hacked 150 0.075 0.402 0.174
p_hacked 200 0.072 0.405 0.172
p_hacked 300 0.074 0.408 0.168
p_hacked 400 0.070 0.399 0.165
r 100 0.048 0.053 0.050
r 150 0.048 0.054 0.049
r 200 0.048 0.053 0.050
r 300 0.048 0.054 0.050
r 400 0.047 0.053 0.050
partial_r 100 0.050 0.057 0.052
partial_r 150 0.048 0.056 0.051
partial_r 200 0.048 0.054 0.051
partial_r 300 0.049 0.054 0.051
partial_r 400 0.047 0.053 0.051
full_lm 100 0.051 0.086 0.061
full_lm 150 0.048 0.072 0.056
full_lm 200 0.049 0.067 0.055
full_lm 300 0.049 0.059 0.053
full_lm 400 0.048 0.057 0.052
lasso 100 0.050 0.065 0.057
lasso 150 0.049 0.063 0.054
lasso 200 0.049 0.057 0.053
lasso 300 0.049 0.056 0.052
lasso 400 0.048 0.056 0.051

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different sample sizes. However, for p-hacked, full lm, and lasso, the Type I error does decrease as sample size increases. This can be visualized in the plot below.

Again, we see the inflation of Type I error for p-hacking. We also see that lasso and full lm perform worse than the other methods for small sample sizes, but the methods become comparable as sample size increases.

by n_covs

We will view the Type I error rates of each method for the different number of available covariates (Note: this is not necessarily the number of covariates included in the model.). In the table below, we see the minimum, maximum, and average Type I error for each level of n_covs by each method.

Type I error by n_covs
method n_covs typeI_min typeI_max typeI_mean
no_covs 4 0.046 0.053 0.050
no_covs 8 0.047 0.052 0.049
no_covs 12 0.047 0.052 0.050
no_covs 16 0.048 0.053 0.050
all_covs 4 0.048 0.053 0.050
all_covs 8 0.047 0.054 0.050
all_covs 12 0.047 0.052 0.050
all_covs 16 0.048 0.054 0.050
p_hacked 4 0.070 0.139 0.096
p_hacked 8 0.096 0.236 0.147
p_hacked 12 0.118 0.332 0.199
p_hacked 16 0.142 0.408 0.244
r 4 0.048 0.054 0.050
r 8 0.048 0.053 0.050
r 12 0.047 0.053 0.050
r 16 0.048 0.054 0.050
partial_r 4 0.048 0.054 0.051
partial_r 8 0.048 0.054 0.051
partial_r 12 0.047 0.055 0.051
partial_r 16 0.049 0.057 0.052
full_lm 4 0.048 0.056 0.051
full_lm 8 0.049 0.061 0.054
full_lm 12 0.048 0.074 0.057
full_lm 16 0.050 0.086 0.060
lasso 4 0.048 0.054 0.051
lasso 8 0.049 0.056 0.053
lasso 12 0.048 0.061 0.054
lasso 16 0.049 0.065 0.056

Looking at the average column, we see that the Type I error is not affected much for the no covariates, all covariates, r, and partial r approaches across different amounts of available covariates. However, for p-hacked, full lm, and lasso, the Type I error increases as the number of covariates increases. This can be visualized in the plot below.

We see the increase in error for p-hacked, full lm, and lasso methods as the number of covariates increases, while the other methods stay around 0.05.

by p_good_covs

We will view the Type I error rates of each method for the different proportions of “good” covariates. In the table below, we see the minimum, maximum, and average Type I error for each level of p_good_covs by each method.

Type I error by p_good_covs
method p_good_covs typeI_min typeI_max typeI_mean
no_covs 0.25 0.047 0.052 0.050
no_covs 0.50 0.046 0.053 0.050
no_covs 0.75 0.047 0.053 0.050
all_covs 0.25 0.047 0.054 0.050
all_covs 0.50 0.048 0.054 0.050
all_covs 0.75 0.047 0.053 0.050
p_hacked 0.25 0.070 0.260 0.135
p_hacked 0.50 0.082 0.340 0.175
p_hacked 0.75 0.091 0.408 0.205
r 0.25 0.048 0.054 0.050
r 0.50 0.048 0.053 0.050
r 0.75 0.047 0.054 0.050
partial_r 0.25 0.049 0.057 0.052
partial_r 0.50 0.048 0.055 0.051
partial_r 0.75 0.047 0.054 0.050
full_lm 0.25 0.049 0.064 0.054
full_lm 0.50 0.048 0.076 0.055
full_lm 0.75 0.048 0.086 0.058
lasso 0.25 0.050 0.065 0.055
lasso 0.50 0.049 0.064 0.054
lasso 0.75 0.048 0.061 0.052

Looking at the average column, we see that the Type I error is not affected for the no covariates, all covariates, and r approaches across the different proportions. However, for p-hacked, partial r, full lm, and lasso, the Type I error changes as the proportion of good covariates increases. This can be visualized in the plots below.

In this plot, we mainly see the increase in Type I error for the p-hacking approach as the proportion of good covariates increases. However, we cannot see the trends of the other methods clearly, so we will plot this again without the p-hacked line.

In this plot, we see different trends across methods. As there are more good covariates, the Type I error decreases for lasso and partial r, but it increases for full lm.

by correlations

In these batches of simulations, we did not vary the correlation among the good covariates. We will look at the Type I error rates of each method by the correlation between \(Y\) and the good covariates.

Type I error by y-cov correlations
method r_ycov typeI_min typeI_max typeI_mean
no_covs 0.3 0.047 0.053 0.050
no_covs 0.5 0.046 0.053 0.050
all_covs 0.3 0.048 0.054 0.050
all_covs 0.5 0.047 0.054 0.050
p_hacked 0.3 0.070 0.188 0.127
p_hacked 0.5 0.081 0.408 0.217
r 0.3 0.048 0.054 0.050
r 0.5 0.047 0.054 0.050
partial_r 0.3 0.049 0.057 0.051
partial_r 0.5 0.047 0.056 0.051
full_lm 0.3 0.050 0.067 0.056
full_lm 0.5 0.048 0.086 0.055
lasso 0.3 0.049 0.065 0.054
lasso 0.5 0.048 0.063 0.053

Looking at the average column, we see that the error does not change across correlations for the no covariates, all covariates, r, and partial r approaches. It changes slightly for full lm and lasso, and drastically for p-hacking, such that a higher correlation between \(Y\) and the good covariates increases the Type I error.

In the bar plot, we see small fluctuations in Type I error for full lm and lasso, and a much larger increase in error for the p-hacking method as the correlation between \(Y\) and the good covariates increases.

Estimate, SD, & SE

Here we will compare, across methods, the estimate of \(b_x\), the standard deviation of the estimate, and the average standard error of the estimate. The standard deviation is calculated as the SD of the sampling distribution of the estimates. The standard error is from the linear model output. Since the mean of standard errors would be biased, we calculate the average SE by taking the square root of the mean of the squared standard errors. We compare the differences by subtracting this average linear model SE from the calculated SD.
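The two averaging rules described above can be written out explicitly. This is an illustrative Python sketch with made-up SE values, not the R code used for the analysis:

```python
import math

def sd_of_estimates(estimates):
    """SD of the sampling distribution of the estimates."""
    m = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - m) ** 2 for e in estimates) / (len(estimates) - 1))

def average_se(standard_errors):
    """Average SE as the square root of the mean squared SE,
    since the plain mean of SEs would be biased."""
    return math.sqrt(sum(se ** 2 for se in standard_errors) / len(standard_errors))

ses = [0.157, 0.151, 0.156]  # hypothetical SEs from three fitted models
print(round(average_se(ses), 4))  # 0.1547
```

The difference column in the table below is then `sd_of_estimates(...) - average_se(...)`.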

b_x = 0
method mean_estimate SD_estimate mean_SE difference
no_covs 0 0.148 0.148 0.000
all_covs 0 0.124 0.124 0.000
p_hacked 0 0.188 0.131 0.057
r 0 0.121 0.121 0.000
partial_r 0 0.122 0.121 0.001
full_lm 0 0.125 0.122 0.004
lasso 0 0.123 0.121 0.002

We see that all methods have an average estimate of 0, as expected. The standard deviation of the sampling distribution of the estimate equals the standard error for the no covariates, all covariates, and r selection methods. There are small differences between these values for the partial r, full lm, and lasso methods. The p-hacking method shows a large difference.

Sampling Distributions

Here we view a sampling distribution of the estimate for \(b_x\) for each method.


We see again that each method’s distribution is centered around 0, but the p-hacked method’s distribution is not normal, reflecting how it biases the parameter estimates.

From these primary analyses, we see that the p-hacked method leads to inflated Type I error rates and biased parameter estimates. For the following analyses, we will not include the p-hacked method. While partial r, full lm, and lasso selection methods showed slight inflation of Type I error, we might be willing to accept this for greater reductions in Type II error, which we will compare in the next section.

Nonzero \(X\) Effect

Next, we look at the nonzero \(X\) effect condition to compare the Type II errors across methods and research settings. Recall that we set two values for \(b_x\) of 0.3 and 0.5.

Type II Error

We will look at the overall Type II error across methods (except p-hacked), then will again compare the error of each method across the manipulated variables: n_obs, n_covs, p_good_covs, and r_ycov (r_cov was held constant at 0.3).

by method

We will first consider the Type II error by the selection method, for both \(b_x = 0.3\) and \(b_x = 0.5\). Here we calculate the proportion of non-significant effects (\(p \geq 0.05\)), the Type II error, displayed below as bar plots.
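Mirroring the Type I error sketch above, the Type II error is the complementary proportion, and its complement is the statistical power. Again an illustrative Python sketch with hypothetical p-values:

```python
# Type II error: proportion of non-significant results (p >= 0.05)
# when the true effect is nonzero; power = 1 - Type II error.
def type2_error(p_values, alpha=0.05):
    return sum(p >= alpha for p in p_values) / len(p_values)

p_vals = [0.03, 0.20, 0.04, 0.60, 0.01]  # hypothetical p-values, b_x != 0
beta = type2_error(p_vals)
print(beta, 1 - beta)  # Type II error and power: 0.4 0.6
```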

b_x = 0.3

In this first plot for \(b_x = 0.3\), we see that the Type II error is highest when no covariates are used in the model. There is a large reduction in Type II error when we include all covariates compared to no covariates, and a slight further reduction in Type II error when we use a selection method for covariates compared to including all.

b_x = 0.5

We see a similar trend here, with \(b_x = 0.5\), of a large reduction in Type II error when including all covariates compared to none, and another small reduction when selecting covariates.

In both cases, we see that the full lm method has the highest Type II error.

by n_obs

We will view the Type II error rates of each method for the different levels of the number of observations in a sample. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_obs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Type II error by n_obs, b_x = 0.3
method n_obs typeII_min typeII_max typeII_mean
no_covs 100 0.676 0.687 0.682
no_covs 150 0.547 0.562 0.554
no_covs 200 0.432 0.447 0.440
no_covs 300 0.259 0.269 0.265
no_covs 400 0.146 0.156 0.151
all_covs 100 0.305 0.674 0.556
all_covs 150 0.121 0.527 0.387
all_covs 200 0.044 0.405 0.266
all_covs 300 0.004 0.229 0.123
all_covs 400 0.000 0.121 0.056
r 100 0.287 0.668 0.539
r 150 0.113 0.521 0.375
r 200 0.041 0.401 0.258
r 300 0.004 0.228 0.120
r 400 0.000 0.119 0.054
partial_r 100 0.285 0.661 0.532
partial_r 150 0.112 0.516 0.372
partial_r 200 0.041 0.399 0.256
partial_r 300 0.004 0.226 0.119
partial_r 400 0.000 0.119 0.054
full_lm 100 0.335 0.660 0.537
full_lm 150 0.144 0.516 0.376
full_lm 200 0.051 0.399 0.259
full_lm 300 0.005 0.226 0.121
full_lm 400 0.000 0.119 0.055
lasso 100 0.281 0.659 0.527
lasso 150 0.113 0.517 0.370
lasso 200 0.042 0.399 0.255
lasso 300 0.004 0.227 0.119
lasso 400 0.000 0.118 0.054

In this table, we see that for all methods, as the sample size increases, the Type II error decreases. This can be seen more clearly in the plot below.

Here, we see the decrease in Type II error as sample size increases. We see that the no covariates approach has the highest Type II error. From both the table and the plot, we see that for smaller sample sizes, including all covariates in the model yields higher Type II errors, but the methods become comparable for larger sample sizes.

b_x = 0.5
Type II error by n_obs, b_x = 0.5
method n_obs typeII_min typeII_max typeII_mean
no_covs 100 0.299 0.310 0.304
no_covs 150 0.134 0.146 0.141
no_covs 200 0.057 0.063 0.060
no_covs 300 0.008 0.011 0.009
no_covs 400 0.001 0.002 0.001
all_covs 100 0.017 0.292 0.173
all_covs 150 0.001 0.120 0.058
all_covs 200 0.000 0.044 0.019
all_covs 300 0.000 0.006 0.002
all_covs 400 0.000 0.001 0.000
r 100 0.013 0.281 0.157
r 150 0.000 0.117 0.053
r 200 0.000 0.042 0.017
r 300 0.000 0.005 0.002
r 400 0.000 0.000 0.000
partial_r 100 0.013 0.273 0.152
partial_r 150 0.000 0.114 0.051
partial_r 200 0.000 0.041 0.017
partial_r 300 0.000 0.005 0.002
partial_r 400 0.000 0.000 0.000
full_lm 100 0.035 0.273 0.162
full_lm 150 0.002 0.114 0.056
full_lm 200 0.000 0.041 0.018
full_lm 300 0.000 0.005 0.002
full_lm 400 0.000 0.000 0.000
lasso 100 0.014 0.274 0.152
lasso 150 0.000 0.116 0.051
lasso 200 0.000 0.042 0.017
lasso 300 0.000 0.006 0.002
lasso 400 0.000 0.001 0.000

We see lower overall Type II errors, but the same trend of decreasing errors for increasing sample sizes across all methods. We can view this in the plot below.

Similarly, the no covariates approach has the highest Type II error. For small sample sizes, the all covariates approach has higher error, which diminishes as sample size increases.

by n_covs

We will view the Type II error rates of each method for the different number of available covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of n_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Type II error by n_covs, b_x = 0.3
method n_covs typeII_min typeII_max typeII_mean
no_covs 4 0.148 0.687 0.418
no_covs 8 0.149 0.683 0.419
no_covs 12 0.151 0.684 0.418
no_covs 16 0.146 0.685 0.418
all_covs 4 0.016 0.670 0.319
all_covs 8 0.004 0.665 0.280
all_covs 12 0.001 0.665 0.260
all_covs 16 0.000 0.674 0.251
r 4 0.015 0.668 0.316
r 8 0.004 0.652 0.274
r 12 0.001 0.642 0.250
r 16 0.000 0.642 0.237
partial_r 4 0.015 0.661 0.314
partial_r 8 0.004 0.640 0.271
partial_r 12 0.001 0.629 0.247
partial_r 16 0.000 0.627 0.234
full_lm 4 0.016 0.660 0.314
full_lm 8 0.004 0.641 0.273
full_lm 12 0.001 0.626 0.251
full_lm 16 0.000 0.629 0.241
lasso 4 0.016 0.659 0.314
lasso 8 0.004 0.636 0.271
lasso 12 0.001 0.619 0.245
lasso 16 0.000 0.611 0.230

From the average column, we can see that Type II error decreases as the number of covariates increases across all methods, except the no covariates method, as this does not depend on the number of covariates. We can see these trends in the plot below.

In this plot, we see the decrease in Type II error as the number of covariates increases. As the number of covariates increases, we see larger reductions in Type II error when selecting the covariates compared to including all available covariates. Lasso performs best with higher numbers of covariates, followed by partial r, r, and full lm.

b_x = 0.5
Type II error by n_covs, b_x = 0.5
method n_covs typeII_min typeII_max typeII_mean
no_covs 4 0.001 0.310 0.103
no_covs 8 0.001 0.305 0.103
no_covs 12 0.001 0.307 0.103
no_covs 16 0.001 0.308 0.103
all_covs 4 0.000 0.286 0.059
all_covs 8 0.000 0.275 0.049
all_covs 12 0.000 0.283 0.046
all_covs 16 0.000 0.292 0.046
r 4 0.000 0.281 0.058
r 8 0.000 0.257 0.045
r 12 0.000 0.251 0.041
r 16 0.000 0.246 0.039
partial_r 4 0.000 0.273 0.057
partial_r 8 0.000 0.245 0.044
partial_r 12 0.000 0.236 0.039
partial_r 16 0.000 0.232 0.037
full_lm 4 0.000 0.273 0.058
full_lm 8 0.000 0.249 0.047
full_lm 12 0.000 0.246 0.044
full_lm 16 0.000 0.246 0.043
lasso 4 0.000 0.274 0.058
lasso 8 0.000 0.250 0.045
lasso 12 0.000 0.236 0.039
lasso 16 0.000 0.226 0.036

We see the same decrease in Type II errors for increases in number of covariates. We can visualize this in the plot below.

Similarly, we see the selection methods performing better than including all covariates. Again for higher numbers of covariates, we see lasso has the lowest Type II error, followed by partial r, r, and full lm.

by p_good_covs

We will view the Type II error rates of each method for the different proportions of “good” covariates. In the tables below, we see the minimum, maximum, and average Type II error for each level of p_good_covs by each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Type II error by p_good_covs, b_x = 0.3
method p_good_covs typeII_min typeII_max typeII_mean
no_covs 0.25 0.149 0.686 0.418
no_covs 0.50 0.148 0.687 0.419
no_covs 0.75 0.146 0.684 0.418
all_covs 0.25 0.010 0.674 0.317
all_covs 0.50 0.002 0.663 0.270
all_covs 0.75 0.000 0.648 0.245
r 0.25 0.009 0.668 0.306
r 0.50 0.002 0.647 0.262
r 0.75 0.000 0.639 0.240
partial_r 0.25 0.009 0.661 0.301
partial_r 0.50 0.002 0.640 0.259
partial_r 0.75 0.000 0.633 0.239
full_lm 0.25 0.009 0.660 0.301
full_lm 0.50 0.001 0.640 0.262
full_lm 0.75 0.000 0.635 0.246
lasso 0.25 0.009 0.659 0.300
lasso 0.50 0.001 0.641 0.259
lasso 0.75 0.000 0.634 0.237

From the average column, we see decreases in Type II error rates for increases in the proportion of good covariates across all methods except no covariates, as this is independent of the proportion of good covariates. We can visualize the trends more clearly in the plot below.

In addition to the decrease in error mentioned above, we see that including all covariates has a higher Type II error than selecting the covariates, especially for lower proportions of good covariates. We see that lasso has the lowest Type II error rate, again more so for lower proportions of good covariates.

b_x = 0.5
Type II error by p_good_covs, b_x = 0.5
method p_good_covs typeII_min typeII_max typeII_mean
no_covs 0.25 0.001 0.310 0.103
no_covs 0.50 0.001 0.307 0.103
no_covs 0.75 0.001 0.307 0.103
all_covs 0.25 0.000 0.292 0.062
all_covs 0.50 0.000 0.270 0.047
all_covs 0.75 0.000 0.263 0.041
r 0.25 0.000 0.281 0.055
r 0.50 0.000 0.256 0.043
r 0.75 0.000 0.239 0.039
partial_r 0.25 0.000 0.273 0.053
partial_r 0.50 0.000 0.250 0.042
partial_r 0.75 0.000 0.234 0.038
full_lm 0.25 0.000 0.273 0.055
full_lm 0.50 0.000 0.254 0.046
full_lm 0.75 0.000 0.246 0.043
lasso 0.25 0.000 0.274 0.054
lasso 0.50 0.000 0.251 0.042
lasso 0.75 0.000 0.235 0.037

From the table, we see decreases in Type II error for higher proportions of good covariates for methods that do include covariates. We can see the details in the plot below.

We again see that including all covariates has a higher Type II error rate than selecting covariates to include, although this method improves for higher proportions of good covariates. Partial r performs best for a lower proportion of good covariates while lasso performs best for a higher proportion.

by n_good_covs

We can look at the interaction between the number of covariates and the proportion of good covariates to calculate the number of good covariates: \(n\_good\_covs = n\_covs \times p\_good\_covs\). This represents the number of available covariates that have a nonzero relationship with \(Y\).
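Crossing the n_covs and p_good_covs levels from the design yields the eight distinct n_good_covs values used in the tables below; a small Python sketch to verify (illustrative only):

```python
# Distinct n_good_covs values from the design: n_covs * p_good_covs
n_covs_levels = [4, 8, 12, 16]
p_good_levels = [0.25, 0.50, 0.75]

n_good = sorted({int(n * p) for n in n_covs_levels for p in p_good_levels})
print(n_good)  # [1, 2, 3, 4, 6, 8, 9, 12]
```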

b_x = 0.3
Type II error by n_good_covs, b_x = 0.3
method n_good_covs typeII_min typeII_max typeII_mean
no_covs 1 0.149 0.686 0.418
no_covs 2 0.151 0.687 0.419
no_covs 3 0.148 0.682 0.417
no_covs 4 0.149 0.685 0.419
no_covs 6 0.149 0.683 0.419
no_covs 8 0.148 0.685 0.418
no_covs 9 0.151 0.684 0.419
no_covs 12 0.146 0.682 0.417
all_covs 1 0.070 0.670 0.358
all_covs 2 0.033 0.665 0.319
all_covs 3 0.016 0.665 0.292
all_covs 4 0.010 0.674 0.281
all_covs 6 0.003 0.649 0.249
all_covs 8 0.002 0.663 0.242
all_covs 9 0.001 0.645 0.228
all_covs 12 0.000 0.648 0.222
p_hacked 1 0.054 0.596 0.305
p_hacked 2 0.019 0.558 0.240
p_hacked 3 0.007 0.538 0.197
p_hacked 4 0.003 0.485 0.161
p_hacked 6 0.001 0.472 0.144
p_hacked 8 0.001 0.395 0.115
p_hacked 9 0.000 0.431 0.124
p_hacked 12 0.000 0.400 0.109
r 1 0.069 0.668 0.354
r 2 0.032 0.652 0.312
r 3 0.015 0.642 0.284
r 4 0.009 0.642 0.268
r 6 0.003 0.633 0.242
r 8 0.002 0.638 0.229
r 9 0.001 0.633 0.223
r 12 0.000 0.623 0.213
partial_r 1 0.068 0.661 0.351
partial_r 2 0.032 0.640 0.309
partial_r 3 0.015 0.633 0.280
partial_r 4 0.009 0.627 0.263
partial_r 6 0.003 0.626 0.240
partial_r 8 0.002 0.627 0.225
partial_r 9 0.001 0.629 0.222
partial_r 12 0.000 0.624 0.212
full_lm 1 0.068 0.660 0.351
full_lm 2 0.032 0.641 0.309
full_lm 3 0.016 0.635 0.281
full_lm 4 0.009 0.629 0.264
full_lm 6 0.003 0.626 0.244
full_lm 8 0.001 0.629 0.233
full_lm 9 0.001 0.624 0.230
full_lm 12 0.000 0.617 0.226
lasso 1 0.068 0.659 0.351
lasso 2 0.033 0.641 0.310
lasso 3 0.016 0.634 0.280
lasso 4 0.009 0.623 0.262
lasso 6 0.003 0.618 0.240
lasso 8 0.001 0.610 0.222
lasso 9 0.001 0.609 0.219
lasso 12 0.000 0.594 0.206

In the table, we see decreases in Type II error rates as the number of good covariates increases across methods that include covariates. We can further visualize this in the plot.

In the plot, we see the decreasing trend in Type II error. Including all covariates yields higher Type II error than selecting them. Lasso has the lowest Type II error, especially as the number of good covariates increases.

b_x = 0.5
Type II error by n_good_covs, b_x = 0.5
method n_good_covs typeII_min typeII_max typeII_mean
no_covs 1 0.001 0.310 0.104
no_covs 2 0.001 0.307 0.103
no_covs 3 0.001 0.301 0.102
no_covs 4 0.001 0.308 0.103
no_covs 6 0.001 0.305 0.103
no_covs 8 0.001 0.305 0.102
no_covs 9 0.001 0.307 0.104
no_covs 12 0.001 0.303 0.103
all_covs 1 0.000 0.286 0.075
all_covs 2 0.000 0.275 0.059
all_covs 3 0.000 0.283 0.052
all_covs 4 0.000 0.292 0.051
all_covs 6 0.000 0.260 0.041
all_covs 8 0.000 0.270 0.043
all_covs 9 0.000 0.243 0.038
all_covs 12 0.000 0.263 0.039
p_hacked 1 0.000 0.222 0.056
p_hacked 2 0.000 0.191 0.037
p_hacked 3 0.000 0.170 0.027
p_hacked 4 0.000 0.138 0.020
p_hacked 6 0.000 0.129 0.018
p_hacked 8 0.000 0.093 0.013
p_hacked 9 0.000 0.109 0.015
p_hacked 12 0.000 0.097 0.013
r 1 0.000 0.281 0.072
r 2 0.000 0.257 0.056
r 3 0.000 0.251 0.048
r 4 0.000 0.246 0.044
r 6 0.000 0.235 0.038
r 8 0.000 0.231 0.036
r 9 0.000 0.223 0.035
r 12 0.000 0.231 0.035
partial_r 1 0.000 0.273 0.070
partial_r 2 0.000 0.250 0.054
partial_r 3 0.000 0.236 0.046
partial_r 4 0.000 0.226 0.041
partial_r 6 0.000 0.225 0.037
partial_r 8 0.000 0.221 0.035
partial_r 9 0.000 0.220 0.035
partial_r 12 0.000 0.232 0.035
full_lm 1 0.000 0.273 0.070
full_lm 2 0.000 0.254 0.055
full_lm 3 0.000 0.245 0.048
full_lm 4 0.000 0.244 0.044
full_lm 6 0.000 0.246 0.042
full_lm 8 0.000 0.244 0.042
full_lm 9 0.000 0.240 0.041
full_lm 12 0.000 0.246 0.042
lasso 1 0.000 0.274 0.071
lasso 2 0.000 0.251 0.055
lasso 3 0.000 0.236 0.047
lasso 4 0.000 0.227 0.042
lasso 6 0.000 0.223 0.037
lasso 8 0.000 0.210 0.034
lasso 9 0.000 0.208 0.033
lasso 12 0.000 0.207 0.031

In the table, we see the Type II error decreasing as the number of good covariates increases. We can get a more nuanced view in the plot below.

Similarly, we see lasso performing best for higher numbers of good covariates. For smaller numbers of good covariates, partial r and lasso perform comparably well.

by correlations

We will look at the Type II error rates of each method by the correlation between \(Y\) and the good covariates, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3
Type II error by correlations, b_x = 0.3
method r_ycov r_cov typeII_min typeII_max typeII_mean
no_covs 0.3 0.3 0.149 0.685 0.418
no_covs 0.5 0.3 0.146 0.687 0.419
all_covs 0.3 0.3 0.076 0.674 0.364
all_covs 0.5 0.3 0.000 0.618 0.191
r 0.3 0.3 0.073 0.668 0.356
r 0.5 0.3 0.000 0.610 0.183
partial_r 0.3 0.3 0.073 0.661 0.352
partial_r 0.5 0.3 0.000 0.606 0.181
full_lm 0.3 0.3 0.084 0.660 0.355
full_lm 0.5 0.3 0.000 0.606 0.184
lasso 0.3 0.3 0.072 0.659 0.349
lasso 0.5 0.3 0.000 0.605 0.181

In the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases. The Type II error is highest for including no covariates. We can visualize this below.

In the plots, we see that the no covariates method has the highest Type II error across correlation levels. The Type II errors decrease for the higher correlation between \(Y\) and the good covariates. We also see slight decreases in Type II error from including all covariates to selecting them.

b_x = 0.5
Type II error by correlations, b_x = 0.5
method r_ycov typeII_min typeII_max typeII_mean
no_covs 0.3 0.001 0.310 0.103
no_covs 0.5 0.001 0.308 0.103
all_covs 0.3 0.000 0.292 0.080
all_covs 0.5 0.000 0.203 0.021
r 0.3 0.000 0.281 0.073
r 0.5 0.000 0.191 0.018
partial_r 0.3 0.000 0.273 0.071
partial_r 0.5 0.000 0.189 0.018
full_lm 0.3 0.000 0.273 0.076
full_lm 0.5 0.000 0.188 0.019
lasso 0.3 0.000 0.274 0.070
lasso 0.5 0.000 0.190 0.018

From the table, we see that the Type II error decreases as the correlation between \(Y\) and the good covariates increases.

In the plots, we see again that the no covariates approach has the highest Type II error across correlations. There is a slight decrease in Type II error when selecting covariates instead of using all covariates. Among the selection methods, full lm has the highest Type II error, but only by a small amount. The Type II error rates are lower across all methods when the correlation between \(Y\) and the good covariates is higher.

Estimate, SD, & SE

As in the zero \(X\) effect condition, we compare, across methods, the estimate of \(b_x\), the standard deviation of the sampling distribution of the estimates, and the average standard error from the linear model output (computed as the square root of the mean of the squared standard errors, since the plain mean of standard errors would be biased). We again compare these by subtracting the average SE from the calculated SD.

b_x = 0.3

Estimate, SD, SE: b_x = 0.3
method mean_estimate SD_estimate mean_SE difference
no_covs 0.300 0.148 0.148 0.000
all_covs 0.300 0.124 0.124 0.000
r 0.299 0.121 0.121 0.000
partial_r 0.300 0.122 0.121 0.001
full_lm 0.300 0.125 0.122 0.004
lasso 0.300 0.123 0.120 0.002

Here we see that all methods correctly estimate \(b_x\) to be 0.3, except the r approach, which yielded a slightly lower average estimate. The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

b_x = 0.5

Estimate, SD, SE: b_x = 0.5
method mean_estimate SD_estimate mean_SE difference
no_covs 0.500 0.148 0.148 0.000
all_covs 0.500 0.124 0.124 0.000
r 0.498 0.121 0.121 0.000
partial_r 0.500 0.122 0.121 0.001
full_lm 0.500 0.125 0.122 0.004
lasso 0.500 0.123 0.120 0.002

Similarly, we see all methods correctly estimate \(b_x\) to be 0.5, except the r approach, which again yielded a slightly lower average estimate. The no covariates, all covariates, and r approaches show no difference between the calculated SD and the linear model SE, while the partial r, full lm, and lasso approaches show slight differences.

Sampling Distributions

Here we view sampling distributions of the estimate for \(b_x\) for each method, for both \(b_x = 0.3\) and \(b_x = 0.5\).

b_x = 0.3


In the plot, we can see that the distributions for all methods are centered around 0.3. The no covariates approach has the widest distribution.

b_x = 0.5


In the plot, we can see that the distributions for all methods are centered around 0.5. The no covariates approach has the widest distribution.

Conclusions

We compared 7 methods for selecting covariates to include in linear models. In the first section looking at Type I errors, we demonstrated that the p-hacking approach is not a statistically valid method, as it led to inflated Type I error rates and biased parameter estimates. The remaining 6 methods were all shown to be statistically valid, and can be further compared by their Type II error results. Overall, using no covariates performed the worst, as it led to the highest Type II error. Including all covariates led to reductions in Type II error, and using one of the selection methods led to further reductions. A comparison of the selection methods across different research settings showed they yielded similar Type II errors. However, for larger numbers of covariates and larger proportions of good covariates, lasso and partial r had the lowest Type II errors.